Big Macs and Eigenfactor Scores: Don't Let 
Correlation Coefficients Fool You 

Abstract 

The Eigenfactor™ Metrics provide an alternative way of evaluat- 
ing scholarly journals based on an iterative ranking procedure anal- 
ogous to Google's PageRank algorithm. These metrics have recently 
been adopted by Thomson-Reuters and are listed alongside the Im- 
pact Factor in the Journal Citation Reports. But do these metrics 
differ sufficiently so as to be a useful addition to the bibliometric tool- 
box? Davis (2008) has argued otherwise, based on his finding of a 0.95 
correlation coefficient between Eigenfactor score and Total Citations 
for a sample of journals in the field of medicine [6]. This conclusion 
is mistaken; here we illustrate the basic statistical fallacy to which 
Davis succumbed. We provide a complete analysis of the 2006 Jour- 
nal Citation Reports and demonstrate that there are statistically and 
economically significant differences between the information provided 
by the Eigenfactor Metrics and that provided by Impact Factor and 
Total Citations. 

Spurious correlations have been ruining empirical 
statistical research from times immemorial. 

Jerzy Neyman, 1972 [8] 

1 Big Macs and Correlation Coefficients 

One might think that if the correlation coefficient between two variables is
high, those variables convey the same information and thus can be used
interchangeably, but this line of reasoning is erroneous. A simple example
helps to illustrate. In Table 1 we provide two statistics for each of 22
countries: the cost of a Big Mac in local currency, and the mean hourly wage
in local currency. The Pearson product-moment correlation coefficient, ρ,
between these two statistics is 0.99. Since ρ is nearly 1, one might conclude
that we can use hourly wages to predict burger prices with high accuracy, and
one might question why anyone should waste his or her time collecting burger
price information if the hourly wage rates are already known. But take a
look at the column "Real Wage". The real wage (the ratio of hourly wages to
burger prices, in burgers per hour) is the variable of economic interest,
since it measures a worker's purchasing power. We see that real wages differ
dramatically across countries. In Denmark, a worker making the mean hourly
wage need only work for seven minutes to earn a Big Mac, whereas in China, a
worker making the mean hourly wage must work for nearly two hours to afford
a burger.

Country           Burger Price   Hourly Wage   Real Wage
Denmark                  24.75        211.13        8.53
Australia                 3.00         19.86        6.62
New Zealand               3.60         21.94        6.09
Switzerland               6.30         37.85        6.01
United States             2.54         14.32        5.64
Britain/UK                1.99         11.15        5.60
Germany                   2.61         14.32        5.49
Canada                    3.33         16.78        5.04
Singapore                 3.30         15.65        4.74
Sweden                   24.00        110.90        4.62
Hong Kong                10.70         44.26        4.14
Spain                     2.37          8.59        3.62
South Africa              9.70         30.86        3.18
France                    2.82          8.50        3.01
Poland                    5.90         11.80        2.00
Hungary                 399.00        704.34        1.77
Czech Rep.        (values illegible in source)
Brazil                    3.60          4.58        1.27
South Korea            3000.00       3134.00        1.04
Mexico                   21.90         17.61        0.80
Thailand                 55.00         31.69        0.58
China                     9.90          5.56        0.56

mean                    166.01        207.32        3.72
std. dev.               638.49        670.63        2.29
std. dev./mean            3.85          3.23        0.62

Table 1: Hourly Wage versus Real Wage. Burger price and hourly wage
are in the local currency. Burger price is the average cost of a Big Mac.
The units for Real Wage are burgers per hour. Data comes from Behar's
"Who earns the most hamburgers per hour?" [3]. The correlation coefficient
between burger price and hourly wage is ρ = 0.99.

In our hamburger example, it is pretty clear what is going on. The
denominations of currencies vary immensely and arbitrarily. It is indeed true
that differences in real wages are small relative to differences in currency
denominations. But it is not true that after correcting for differences in
denominations, differences in real wages are negligible. One way to think of
this is that the greatest part of the variation in hourly wage comes from the
relatively unimportant fact that currency is denominated differently in
different countries. The standard deviation of hourly wages in nominal terms
is about 300 times as large as that in real terms. Although the standard
deviation of real wages across countries is tiny compared to that of nominal
wages, this variation is far more important for the quality of life of
workers. Thus, one would be wrong to conclude from the high correlation
coefficient that the real wage is constant across countries. Quite the
contrary; the standard deviation of this ratio is 62% of the mean.
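
The contrast between a near-perfect correlation and a highly variable ratio
can be checked directly. The following is a minimal sketch, not the
computation behind Table 1: it uses only the 21 rows whose values are legible
in this copy (the Czech Republic row is left out), so the numbers it prints
will differ slightly from the 22-country figures quoted in the text. The
helper names `pearson` and `coefficient_of_variation` are our own.

```python
import math

# (burger price, hourly wage) in local currency, copied from Table 1.
# Assumption: the Czech Republic row is omitted because its values are
# not legible in this copy, so this is a 21-country subset.
ROWS = {
    "Denmark": (24.75, 211.13), "Australia": (3.00, 19.86),
    "New Zealand": (3.60, 21.94), "Switzerland": (6.30, 37.85),
    "United States": (2.54, 14.32), "Britain/UK": (1.99, 11.15),
    "Germany": (2.61, 14.32), "Canada": (3.33, 16.78),
    "Singapore": (3.30, 15.65), "Sweden": (24.00, 110.90),
    "Hong Kong": (10.70, 44.26), "Spain": (2.37, 8.59),
    "South Africa": (9.70, 30.86), "France": (2.82, 8.50),
    "Poland": (5.90, 11.80), "Hungary": (399.00, 704.34),
    "Brazil": (3.60, 4.58), "South Korea": (3000.00, 3134.00),
    "Mexico": (21.90, 17.61), "Thailand": (55.00, 31.69),
    "China": (9.90, 5.56),
}


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance) / mean


prices = [price for price, _ in ROWS.values()]
wages = [wage for _, wage in ROWS.values()]
real_wages = [wage / price for price, wage in ROWS.values()]  # burgers/hour

rho = pearson(prices, wages)
cv_real = coefficient_of_variation(real_wages)
print(f"correlation(price, wage) = {rho:.3f}")
print(f"std. dev./mean of real wage = {cv_real:.2f}")
```

The correlation is driven almost entirely by the few currencies with large
denominations, while the spread of the real wage is unaffected by them.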

2 Davis's analysis 

Davis (2008) fell into a similar trap in his recent comparison of journal
rankings by Eigenfactor score and by Impact Factor or Total Citations [6].
In that paper, Davis aimed to determine whether measures of "popularity"
such as Impact Factor and Total Citations differ substantially from measures
of "prestige" such as the journal PageRank [5] and the Eigenfactor metrics
[Sj.^1 To do so, Davis conducted a regression analysis of Eigenfactor scores
on Total Citations^2 for a set of 165 medical journals.^3 Davis reports that
the correlation coefficient between 2006 Eigenfactor scores and Total
Citations^4 is ρ = 0.9493. Based on this result, Davis concluded that:

"At least for medical journals, it does not appear that iterative
weighting of journals based on citation counts results in rankings
that are significantly different from raw citation counts. Or,
stated another way, the concepts of popularity (as measured by
total citation counts) and prestige (as measured by a weighting
mechanism) appear to provide very similar information."

But is Davis right? Is it really the case that if you know the number of
citations, you would be wasting your time by finding the Eigenfactor score?
Not at all.

First, Davis made a classic statistical error (cautioned against by Karl
Pearson in 1897) in comparing two measures with a common factor [9].
Second, Davis suggests that a high correlation coefficient implies that there
is no significant difference between two alternative measures; this is simply
false. We address these issues in turn.

1 The same issue was the subject of a more comprehensive analysis by Bollen
and colleagues in 2006 [5]. In that paper, Bollen and colleagues compare
weighted PageRank with Impact Factor and with Total Citations to explore
differences between popularity and prestige. Weighted PageRank and
Eigenfactor are both variants of the PageRank algorithm. See also Pinski and
Narin (1976) for an early attempt at constructing prestige-based measures
using citation data, and Vigna (2009) for a discussion of how Pinski and
Narin's measure differs from current approaches [10, 12].

2 In his paper Davis also looked at the correlation coefficient between
Eigenfactor and Impact Factor scores. This ρ value is lower (ρ = 0.86), but
the point is not so much what this value is, but rather that the comparison
makes little sense. Eigenfactor is a measure of total citation impact, and
should (all else equal) scale with the size of the journal. Impact Factor is
a measure of citation impact per paper, and all else equal should be
independent of journal size. If one wants to compare an Eigenfactor metric
with the Impact Factor, one should use the Article Influence Score, which is
a per-article measure like Impact Factor. We explore this comparison later
in the paper.

3 Contrary to what is specified in that paper, Davis appears to have sampled
from both the "Medicine General and Internal" and "Medicine Research and
Experimental" fields, not merely the former category. In our analysis of the
same subfields of medicine, we included 168 journals (of the 171 journals in
this field); we eliminated 3 journals because they had an Impact Factor
and/or Article Influence score of zero.

4 Davis appears to have used citations (from year 2006) to all articles
published in the journals he selected. A cleaner comparison, which would have
resulted in a higher correlation, would have been to extract citations (from
year 2006) to articles published in the past five years, since the
Eigenfactor score takes into account only the past 5 years' citations.

3 Journal Sizes and Spurious Correlations 

There are enormous differences in the size of academic journals, and these
differences swamp the patterns that Davis was seeking in his analysis. The
JCR indexes journals that range in size from tiny (Astronomy and
Astrophysics Review has published 13 articles over the previous five years)
to huge (The Journal of Biological Chemistry has published 31,045 articles
over the same period) with a coefficient of variation, c_v, equal to 1.910.
Per-article citation intensity varies less, whether measured by Article
Influence or by Impact Factor (AI: range 0-27.5, coefficient of variation =
1.785; IF: range 0-63.3, coefficient of variation = 1.548).

We can formalize these observations by decomposing Davis's regression
of Eigenfactor on Total Citations. Davis regresses

    \log(EF_i)  vs.  \log(CT_i),

where EF_i is the Eigenfactor score for journal i and CT_i is the Total
Citations received by journal i. We let AI_i be the Article Influence for
journal i, and N_{i,5} be the total number of articles published over the
last five years for journal i. Then by definition

    \log(EF_i) = \log(c_1 \times AI_i \times N_{i,5})
               = \log c_1 + \log AI_i + \log N_{i,5},

where c_1 is a scaling constant that normalizes the Article Influence scores
so that the mean article in the JCR has an Article Influence score of 1.00.
Similarly, letting IF_i be the Impact Factor for journal i and N_{i,2} the
number of articles published over the last two years,

    \log(CT_i) \approx \log(c_2 \times IF_i \times N_{i,2})
               \approx \log(c_2 c_3 \times IF_i \times N_{i,5})
               = \log c_2 c_3 + \log IF_i + \log N_{i,5},

where c_2 and c_3 are additional scaling constants. The scaling constant c_2
accounts for the fact that Davis compared citations for all years and not
just citations for 2 years. The scaling constant c_3 relates the number of
articles published in two years to the number of articles published in five
years (and thus is approximately 2/5, since N_{i,2} ≈ c_3 N_{i,5}). As a
result, Davis is effectively calculating a regression between

log(Article Influence) + log(Total Articles) 

and 

log(Impact Factor) + log(Total Articles). 

Having the "log(Total Articles)" term on both sides of the regression,
especially given that it varies more than the other two terms, obscures
the relation between the variables that one would actually wish to observe
when trying to evaluate the difference between "popularity" and "prestige".
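
To see how much correlation the shared term alone can generate, consider the
following sketch. It rests on a simplifying assumption of ours, made only for
illustration: that log(Article Influence), log(Impact Factor) and
log(Total Articles) are mutually uncorrelated, with variances \sigma_A^2,
\sigma_F^2 and \sigma_N^2 respectively, and that the constant terms can be
ignored because they do not affect a correlation.

```latex
\begin{align*}
\operatorname{Cov}\bigl(\log AI + \log N,\; \log IF + \log N\bigr)
  &= \operatorname{Var}(\log N) = \sigma_N^2, \\[4pt]
\rho\bigl(\log AI + \log N,\; \log IF + \log N\bigr)
  &= \frac{\sigma_N^2}
          {\sqrt{(\sigma_A^2 + \sigma_N^2)\,(\sigma_F^2 + \sigma_N^2)}}.
\end{align*}
```

Under these assumptions the coefficient approaches 1 as \sigma_N^2 grows
relative to the other two variances, even though the per-article measures
carry no information about one another.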

This pitfall is famous in the history of mathematical statistics. In 1897,
two years after pioneering statistician Karl Pearson developed the
product-moment correlation coefficient, he presented a paper to the Royal
Society in which he noted that fellow biometrician W. F. R. Weldon had made
precisely this mistake in the analysis of body dimensions of crustaceans
[HIE]. Explaining this error, Pearson wrote:

"If the ratio of two absolute measurements on the same or different
organs be taken it is convenient to term this ratio an index.
If u = f_1(x, y) and v = f_2(z, y) be two functions of the three
variables x, y, z, and these variables be selected at random so that
there exists no correlation between x, y, y, z, or z, x, there will still
be found to exist correlation between u and v. Thus a real danger
arises when a statistical biologist attributes the correlation
between two functions, like u and v to organic relationship."

It was to describe this danger that Pearson coined the term spurious cor- 
relation [1]. He imagined a set of bones assembled at random. Based on 
correlations between measurements that share a common factor, a biologist 
could easily make the mistake of concluding that the bones were properly 
assembled into their original skeletons: 

"For example, a quantity of bones are taken from an ossuarium, 
and are put together in groups, which are asserted to be those 
of individual skeletons. To test this a biologist takes the triplet 
femur, tibia, humerus, and seeks the correlation between the in- 
dices femur/humerus and tibia/humerus. He might reasonably 
conclude that this correlation marked organic relationship, and 
believe that the bones had really been put together substan- 
tially in their individual grouping. As a matter of fact, since 
the coefficients of variation for femur, tibia, and humerus are 
approximately equal, there would be, as we shall see later, a correlation
of about 0.4 to 0.5 between these indices had the bones
been sorted absolutely at random. I term this a spurious or- 
ganic correlation, or simply a spurious correlation. I understand 
by this phrase the amount of correlation which would still exist 
between the indices, were the absolute lengths on which they 
depend distributed at random." 

The reason for this correlation will be that some of the random femur 
and tibia pairs will be combined with a large humerus; in this case both the 
femur/humerus and tibia/humerus ratio will tend to be smaller than average. 
Other femur and tibia pairs will be combined with a small humerus; in this 
case both the femur/humerus and tibia/humerus ratio will tend to be larger 
than average. Correlation coefficients of the two ratios give the illusion that 
tibia and femur length covary, even when they in fact do not. For his part, 
Weldon was forced to concede that nearly 50% of the correlation he had 
observed in body measurements was actually due to this effect. 
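
Pearson's thought experiment is easy to reproduce numerically. The sketch
below is ours, not Pearson's calculation: the mean bone lengths, the common
coefficient of variation of 5%, the normal distribution and the random seed
are all assumptions chosen for illustration. The three lengths are drawn
independently, so any correlation between the two indices is spurious in
Pearson's sense.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def random_skeletons(n=20000, cv=0.05, seed=1897):
    """Draw femur, tibia and humerus lengths independently of one another.

    The mean lengths (in cm) are hypothetical; only the shared coefficient
    of variation `cv` matters for the result.
    """
    rng = random.Random(seed)
    femur = [rng.gauss(45.0, 45.0 * cv) for _ in range(n)]
    tibia = [rng.gauss(37.0, 37.0 * cv) for _ in range(n)]
    humerus = [rng.gauss(33.0, 33.0 * cv) for _ in range(n)]
    return femur, tibia, humerus


femur, tibia, humerus = random_skeletons()

# Raw lengths: uncorrelated by construction.
raw_rho = pearson(femur, tibia)

# Indices that share the humerus as a common denominator.
femur_index = [f / h for f, h in zip(femur, humerus)]
tibia_index = [t / h for t, h in zip(tibia, humerus)]
index_rho = pearson(femur_index, tibia_index)

print(f"femur vs tibia:                 rho = {raw_rho:.3f}")
print(f"femur/humerus vs tibia/humerus: rho = {index_rho:.3f}")
```

With equal coefficients of variation the index correlation should come out
close to one half, in line with the "0.4 to 0.5" that Pearson describes,
while the raw lengths remain essentially uncorrelated.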

Just over a decade later, another important figure in the development of
mathematical statistics, G. U. Yule, noted that when absolute values share a
common factor, they are just as susceptible to this problem as are "indices"
or ratios [15]:

"Suppose we combine at random two indices z_1 and z_2, e.g. two
death-rates, and also combine at random with each pair a
denominator or population x_3. The correlations between z_1, z_2,
and x_3 will then be zero within the limits of sampling. But now
suppose we work out the total deaths x_1 = z_1 x_3 and x_2 = z_2 x_3;
the correlation r_{12} between x_1 and x_2 will not be zero, but
positive."



This is precisely the form of spurious correlation that arises in Davis's
analysis. Per-article popularity as measured by Impact Factor takes the
role of z_1 in Yule's example, and per-article prestige as measured by
Article Influence score takes the role of z_2. Total Articles takes the role
of Yule's x_3. Even if Impact Factor and Article Influence were entirely
uncorrelated, Davis still would have observed a high correlation coefficient
in his regression of Eigenfactor and Total Citations (ρ ≈ 0.6 for all
journals), because both share number of articles as a common factor. What
Davis discovered is not that popularity and prestige are the same thing; he
discovered that big journals are big and small journals are small. Because
of this wide variation in journal size, one would also observe a high
correlation coefficient between pages and total cites, though very few would
argue that the former is an adequate surrogate for the latter.^5
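
Yule's construction can be simulated in a few lines. The sketch below is an
illustration under assumptions of our own: the per-article measures and the
journal sizes are drawn as independent log-normal variables, and the spreads
(`sigma_per_article`, `sigma_size`) are arbitrary illustrative values, not
estimates from the JCR. The simulated coefficient is therefore not meant to
reproduce the ρ ≈ 0.6 quoted above, only the mechanism behind it.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_journals(n=5000, sigma_per_article=1.0, sigma_size=2.0, seed=2006):
    """Return independent draws of log(IF), log(AI) and log(article count)."""
    rng = random.Random(seed)
    log_if = [rng.gauss(0.0, sigma_per_article) for _ in range(n)]
    log_ai = [rng.gauss(0.0, sigma_per_article) for _ in range(n)]
    log_size = [rng.gauss(0.0, sigma_size) for _ in range(n)]
    return log_if, log_ai, log_size


log_if, log_ai, log_size = simulate_journals()

# Totals are per-article measures multiplied by size, i.e. sums of logs.
log_total_cites = [f + s for f, s in zip(log_if, log_size)]
log_eigenfactor = [a + s for a, s in zip(log_ai, log_size)]

rho_per_article = pearson(log_if, log_ai)
rho_totals = pearson(log_total_cites, log_eigenfactor)

print(f"per-article measures: rho = {rho_per_article:.3f}")
print(f"size-scaled totals:   rho = {rho_totals:.3f}")
```

With these settings the per-article measures are uncorrelated by
construction, yet the totals correlate strongly because both inherit the
variation in journal size.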

To avoid this problem, we might want to look at the correlation between
popularity per article and prestige per article. That is, we need to look at
the comparison

    Log(Article Influence) vs. Log(Impact Factor).

Since its inception in January 2007, Eigenfactor.org has provided exactly
this information^6 at http://www.eigenfactor.org/correlation/ for the
entire JCR dataset and also for each individual field of scholarship as
defined by the JCR. Figure 1 is a histogram of the correlation coefficients
between Impact Factor and Article Influence scores for all 231 categories in
the 2006 JCR. The mean for all fields was 0.853 with a standard deviation
of 0.099. The field with the lowest correlation coefficient is Communication
(ρ = 0.478). Marine Engineering has the highest correlation (ρ = 0.986).
The sample of medical journals that Davis selected, with ρ = 0.954, ranks in
the 90th percentile when compared to all 231 fields. Correlation coefficients
within fields typically exceed the correlation coefficient for all journals
together. For all 7,611 journals considered together, ρ = 0.818. This value
is lower than the mean of individual-field correlation coefficients, which is
ρ = 0.853.

Figure 1: Histogram of correlation coefficients between Impact Factor and
Article Influence scores. This includes all 231 categories in the 2006 Science
and Social Science JCR. The mean of all fields is 0.853 (intra-field mean) and
the standard deviation is 0.099. The correlation for all journals considered
together is 0.818. The correlation for the field of Medicine as studied by
Davis is 0.954. The correlation coefficients for all fields can be found at
http://www.eigenfactor.org/correlation/.

5 We collected page and citation information for 149 Economics journals in
2006. The correlation coefficient between total pages and total citations is
ρ = 0.615.

6 Falagas et al. (2008) presented a similar comparison of Impact Factor and
the SJR indicator (a per-article measure of prestige) [7]. Waltman and van
Eck look at correlations among a number of bibliometric measures; their
discussion of differences between Impact Factor and Article Influence is
noteworthy [13].

4 Correlation and significant differences

To evaluate Davis's claim that Eigenfactor score and Total Citations are
telling us the same thing, we can focus on the ratio of Eigenfactor score to
Total Citations (EF/TC). (When we look at the ratio, the common factor
"Total Articles" divides out.) Notice that a journal's EF/TC ratio is a
measure of "bang per cite received", that is, how much Eigenfactor boost
this journal receives, on average, when it is cited. In the hamburger
example, the corresponding notion is "burgers per hour," the real wage or
purchasing power of an hour's work. Does a high correlation between Total
Citations and Eigenfactor score mean that the bang per cite received is about
constant? If it is, there really would be no point to looking at Eigenfactor
scores instead of Total Citations. So let's see what happens.

Figure 2 shows the ratio of Eigenfactor score to Total Citations for every
journal in the JCR, and the inset shows just the medical journals. The
standard deviation of this ratio is 1.1 × 10^-5 and the mean is 1.56 × 10^-5.
The standard deviation, in this case, is 71% of the mean. This is even more
variable than the Big Mac case! Moreover, there are nearly 1000 journals
with twice the mean "bang per cite".

Figure 2: Ratio of Eigenfactor score to Total Citations. Data are normalized
by the median ratio of the data set. The dashed line indicates a ratio of one.
The journals are ordered from those with the highest ratio to the lowest. The
inset shows only the 168 medical journals from Davis's analysis.

The thing to notice in both the Big Mac and the journal example is that
if you are interested in the ratio of A to B and if A = ax and B = bx
for some x with a very high variance relative to that of a and of b, you
will get a very high ρ value when you regress B on A. However, if what
really interests you is the ratio A/B, you will note that the x's cancel and
A/B = ax/bx = a/b. Thus, the variance of x has literally nothing to tell
you about the variance of the ratio a/b. You don't learn whether a/b is
nearly constant or highly variable from looking at the correlation between
A and B.
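
The cancellation can be written out on the log scale, which is the scale
used in the regressions discussed earlier. This is a short worked identity
rather than a new result; it assumes only that a, b and x are positive so
that the logarithms exist.

```latex
\begin{align*}
A = a\,x,\quad B = b\,x
  \;&\Longrightarrow\; \frac{A}{B} = \frac{a\,x}{b\,x} = \frac{a}{b}, \\[4pt]
\log\frac{A}{B} &= \log a - \log b, \\[4pt]
\operatorname{Var}\!\left(\log\frac{A}{B}\right)
  &= \operatorname{Var}(\log a) + \operatorname{Var}(\log b)
     - 2\,\operatorname{Cov}(\log a, \log b).
\end{align*}
```

No term involving x appears on the right-hand side, so the spread of x can
be arbitrarily large without constraining the spread of the ratio.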

If, as Davis claims, Eigenfactor scores do not differ significantly from
Total Citation counts, the ratio EF/TC should be constant across different
groups of journals. To evaluate this claim, we compare the EF/TC ratios of
social science journals with those of science journals, with groupings
determined by whether a journal is listed in the Social Science JCR or the
Science JCR. (Journals listed in both are omitted from the analysis.) The
mean EF/TC ratio for science journals is 1.42 × 10^-5, whereas the mean for
social science journals is 2.12 × 10^-5. A Mann-Whitney U test shows that
this difference is highly significant, at the p < 10^-167 level.
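
For readers unfamiliar with the test, the following is a minimal sketch of a
Mann-Whitney U computation using a normal approximation for the p-value. It
is not the code used for the analysis above: the function name is our own,
the variance formula omits the tie correction, and the two samples are
hypothetical EF/TC ratios (in units of 10^-5) invented purely to show the
mechanics. In practice a library routine such as `scipy.stats.mannwhitneyu`
would normally be preferred.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, z-score, two-sided p-value).

    Tied values receive the average of the ranks they span. The p-value
    uses the large-sample normal approximation without a tie correction.
    """
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2

    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, z, p_two_sided


# Hypothetical EF/TC ratios (x 10^-5) for two small groups of journals.
science = [1.1, 1.3, 1.4, 1.5, 1.2, 1.6]
social_science = [1.9, 2.2, 2.0, 2.4, 1.8, 2.5]

u, z, p = mann_whitney_u(science, social_science)
print(f"U = {u}, z = {z:.2f}, p = {p:.4f}")
```

Because the test works on ranks, it makes no assumption that the ratios are
normally distributed, which suits heavily skewed bibliometric data.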

These differences are not only statistically significant, but also
economically relevant. The 49% difference in mean EF/TC ratios indicates that
a librarian who uses Total Citations to measure journal value will
underestimate the value of social science journals by 49% relative to a
librarian who uses Eigenfactor scores to measure value.

There are also significant differences within the sample of journals that 
Davis considered. Based on the difference between science and social sci- 
ence ratios described above, one might expect medical journals more closely 
associated with the social sciences, such as those in public health, to have 
higher-than-average EF/TC ratios. Seven of the publications in Davis's 
sample of medical journals are cross-listed in the JCR category of public, 
environmental, and occupational health. Indeed, this group of journals has 
a 29% higher EF/TC ratio than do the rest of the journals in Davis's sample, 
again statistically significant (Mann-Whitney U test, p < .01). 

Note that there is nothing special about this particular comparison be- 
tween sciences and social sciences; one could test any number of alternative 
hypotheses and would find significant differences between EF/TC ratios for 
many other comparisons as well. 

5 The value of visualization 

So, if correlation coefficients are misleading, what is the alternative?
First, we argue for a deeper examination of the data. Figure 3 is an example
of this strategy.^7 Listing the journals in this way, one is able to quickly
see the ordinal differences that exist between these highly correlated data.
This type of graphical display illustrates the interesting stories that can
be lost behind a summary statistic such as the Spearman correlation.

7 Figure 3 caption: Journal ranking comparisons by Total Citations and
Eigenfactor score. The journals listed are the top 50% from the field of
Medicine that Davis analyzed. Journals in the left column are ranked by Total
Citations for all years. Journals in the right column are ranked by
Eigenfactor score. The lines connecting the journals indicate whether the
journal moved up (green), down (red) or stayed the same (black) relative to
their ranking by Total Citations. Journal names in black can also be journals
that do not exist in both columns.

Figure 3 illustrates the ordinal ranks of the top 50% of the medical
journals used in Davis's study. In the left column, the journals in this
subfield of medicine are ranked by the total number of citations. In the right
column, the journals are ordered by Eigenfactor score. The lines connecting
the journals indicate whether the journal moved up (green), down (red) or
stayed the same (black) relative to their ranking by Total Citations. The
figure highlights the differences between the metrics. For example, Aviation
Space and Environmental Medicine drops 30 places while PLoS Medicine
rises 31 places. Davis claims in his paper that the ordering of journals does
not change drastically. Figure 3 suggests otherwise.

Figure 4 compares the ordinal ranking by Impact Factor and Article
Influence for 84 journals (the top-ranked half) from Davis's study.^8
Changes in ranking are even more dramatic when we look at the lower-ranked
84 journals. The correlation coefficient between Impact Factor and Article
Influence for the 84 journals shown is ρ = 0.955. Despite this high
correlation, the figure highlights the fact that the two metrics yield
substantially different ordinal rankings.
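
The rank comparisons behind such figures are simple to compute. The sketch
below shows one way to tabulate how far each journal moves between two
metrics. The journal names and scores are hypothetical placeholders, not
values from the JCR, and the helper names `rank_positions` and `rank_shifts`
are our own.

```python
def rank_positions(scores):
    """Map each journal to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def rank_shifts(metric_a, metric_b):
    """Rank under metric_a minus rank under metric_b for shared journals.

    A positive value means the journal ranks higher (moves up) under
    metric_b; a negative value means it moves down.
    """
    ranks_a = rank_positions(metric_a)
    ranks_b = rank_positions(metric_b)
    shared = ranks_a.keys() & ranks_b.keys()
    return {name: ranks_a[name] - ranks_b[name] for name in shared}


# Hypothetical scores for five made-up journals.
impact_factor = {
    "Journal A": 9.0, "Journal B": 7.5, "Journal C": 6.0,
    "Journal D": 4.0, "Journal E": 3.5,
}
article_influence = {
    "Journal A": 4.1, "Journal B": 2.0, "Journal C": 3.2,
    "Journal D": 1.1, "Journal E": 1.9,
}

shifts = rank_shifts(impact_factor, article_influence)
for name in sorted(shifts):
    print(f"{name}: {shifts[name]:+d}")
```

Listing the shifts journal by journal exposes reorderings that a single
correlation coefficient summarizes away.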

8 Figure 4 caption: Comparing Impact Factor and Article Influence. The
journals shown are from the same field that Davis analyzed (because of
limited space, only the top 84 journals are shown). For these 84 journals,
the correlation coefficient between IF and AI is ρ = 0.955. The relative
rankings by Impact Factor and Article Influence are listed in the left and
right column, respectively. The third column lists the Article Influence
scores. The journal names in green indicate those that fare better when
ranked by Article Influence; the journal names in red fare better when ranked
by Impact Factor. The names in black are journals that exhibit no change or
exist outside the range of the journals shown.

[Figure 3 graphic: two ranked columns of journal abbreviations, headed
"Total Citations" (left) and "Eigenfactor" (right), joined by connecting
lines.]

Figure 3: See footnote in text for caption.

Figure 4: See footnote in text for caption.



Figure 4 reveals that the top few journals change in rank less than those
further down the hierarchy. For example, going from Impact Factor to
Article Influence, the journals in the top ten change in rank by only 1 or
2 positions. By contrast, there are many larger changes further on in the
rankings.^9 For example, as we go from Impact Factor to Article Influence,
the Journal of General Internal Medicine rises 18 spots to number 19 while
Pain Medicine drops 35 spots to end up at number 80. These are just two
of the many major shifts (in a field with a correlation of 0.955!). These
changes in relative ranking would certainly not go unnoticed by editors or
publishers.

Furthermore, while ordinal changes are interesting, cardinal changes are
often more important. Figure 5 shows the top ten journals from Figure 4
(those with the least ordinal change from one metric to another), now in
their cardinal positions. Even those journals that do not change ordinal rank
from one metric to another may be valued very differently under the two
different metrics. For example, Nature Medicine is the #2 journal regardless
of whether one uses Impact Factor or Article Influence. But under Impact
Factor, it has barely half the prestige of the first-place New England Journal
of Medicine, whereas by Article Influence it makes up a good deal of that
ground.

9 Bollen (2006) observed a similar pattern in a series of scatterplots contrasting PageR- 
ank and Impact Factor values for all journals [5]. In these scatterplots the rankings of 
top-tier journals differ relatively little whereas more variation is found in the middle and 
bottom portions of the hierarchy. 

Figure 5: Cardinal differences between Impact Factor and Article Influence 
score. The top ten journals by Impact Factor are shown in the left column. 
The scores are scaled vertically, reflecting their cardinal positions. The 
smallest Impact Factor score is on the bottom, and the highest Impact 
Factor score is on the top. The right column shows the same journals scaled 
by Article Influence. 

6 Conclusion

Correlation coefficients can be useful statistical tools. They can help us 
identify some kinds of statistically significant relationships between pairs of 
variables, and they can tell us about the sign (positive or negative) of these 
relationships. One must use considerably greater caution, however, when 
drawing conclusions from the magnitude of correlation coefficients — all the 
more so in the presence of spurious correlates and in the absence of a formal 
hypothesis-testing framework. In particular, we have illustrated that just
because two metrics have a high correlation (0.8 or 0.9 or even higher),
we cannot safely conclude that they convey the same information, or that
one has little additional information to tell us beyond what we learn from
the other.

Comparative studies of alternative measures can be very useful in choosing
an appropriate bibliometric toolkit. We close with a few suggestions
for how one might better conduct these sorts of analyses. First, be wary of
what correlation coefficients say about the relationship of two metrics [2].
High correlation does not necessarily mean that two variables provide the
same information any more than a low correlation means that two variables
are unrelated. Purchasing power varies wildly despite the high correlation
between wage and hamburger price in our Big Mac example. At the other
end of the spectrum, in the chaotic region of the logistic map, successive
iterates have an immediate algebraic relationship yet a correlation of zero.
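
The logistic map claim can be checked numerically. The sketch below makes
choices of our own for illustration: it fixes the map parameter at 4 (one
point in the chaotic region), and the starting value, burn-in length and
orbit length are arbitrary. Each iterate is an exact quadratic function of
the previous one, yet the linear correlation between successive iterates is
close to zero.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def logistic_orbit(x0=0.123, n=20000, burn_in=100):
    """Iterate x -> 4 x (1 - x), discarding an initial transient."""
    x = x0
    for _ in range(burn_in):
        x = 4.0 * x * (1.0 - x)
    orbit = []
    for _ in range(n):
        orbit.append(x)
        x = 4.0 * x * (1.0 - x)
    return orbit


orbit = logistic_orbit()
current, following = orbit[:-1], orbit[1:]

rho_successive = pearson(current, following)
print(f"correlation between x_n and x_(n+1): {rho_successive:.3f}")
```

A scatter plot of the same pairs would trace out a perfect parabola, which
is exactly the kind of structure a correlation coefficient cannot see.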

Second, appropriate data visualization can bring out facets of the data
that are obscured by summary statistics. Different forms of data graphics
can be better suited for certain tasks; for example, comparison plots such
as those in Figure 4 better highlight the differences between bibliometric
measures than do standard scatter plots.

Finally, simple observations can be at least as powerful as rote statistical
calculations in understanding the nature of our data. For example, the
median of the burgers/hour in the top third of the countries is about five
times the median of the burgers/hour in the bottom third. This says a great
deal about the differences in purchasing power across countries. The median
"bang per cite received" in the top third of journals is almost 2.4 times
the median in the bottom third. This says a great deal about the difference
in how journals are valued under the Eigenfactor metrics, and helps us
understand why the Eigenfactor metrics offer a substantially different view
of journal prestige than that which we get from straight citation counts.